Abstract We have employed NMR to investigate the structure of the SARS
coronavirus nucleocapsid protein dimer. We found that the secondary
structure of the dimerization domain consists of five α helices and a
β-hairpin. The dimer interface consists of a continuous four-stranded
β-sheet superposed by two long α helices, reminiscent of that found in
the nucleocapsid protein of porcine reproductive and respiratory
syndrome virus. Extensive hydrogen bond formation between the two
hairpins and hydrophobic interactions between the β-sheet and the α
helices render the interface highly stable. Sequence alignment suggests
that other coronaviruses may share the same structural topology.

© 2005 Published by Elsevier B.V. on behalf of the Federation of 
European Biochemical Societies. 
1. Introduction


Severe acute respiratory syndrome (SARS), the first emerging infectious
disease of the 21st century, has a fatality rate of ca. 8% and is caused
by a novel coronavirus (CoV) [1]. The nucleocapsid (N) protein is a key
component of the virus and is essential for virus formation. It binds to
the viral RNA to form a ribonucleoprotein core, which can enter the host
cell and interact with cellular processes [2-5]. The free protein
presumably exists as a dimer in solution, with the dimerization domain
located at the C-terminus [6,7]. We have previously defined the
structural domains of the SARS-CoV N protein [8]. The C-terminal
structural domain coincides with the dimerization domain identified in
previous studies, and our biochemical studies showed that it exists as a
dimer in solution. Denaturation studies have shown that dissociation of
the SARS-CoV N protein is coupled with loss of structure, implying a
structure-dependent mechanism for self-association [9]. Understanding
this mechanism would not only provide insights into the viral assembly
process but also identify additional targets for drugs that combat SARS
by disrupting N protein self-association. However, no 3D structure of
the dimerization domain of a coronavirus N protein has been published,
and the underlying principle governing the self-association of the
coronavirus N protein dimer is unknown. We have employed nuclear
magnetic resonance (NMR) techniques to investigate the structure of the
dimerization domain of SARS-CoV N protein. We report our results in this
communication.

Abbreviations: NP248-365, a SARS-CoV nucleocapsid protein fragment
consisting of residues 248-365; NP281-365, a SARS-CoV nucleocapsid
protein fragment consisting of residues 281-365; PRRSV, porcine
reproductive and respiratory syndrome virus


2. Materials and methods 


2.1. Plasmid construction 

SARS-CoV TW1 strain cDNA clones were kindly provided to us by 
Dr. P.-J. Chen of National Taiwan University Hospital [10]. Clones
encoding the SARS-CoV nucleocapsid protein fragments consisting of
residues 281-365 (NP281-365) and residues 248-365 (NP248-365) were
obtained by polymerase chain reaction (PCR) on a RoboCycler Gradient 96
(Stratagene, CA, USA) using appropriate primers. Each resulting PCR
fragment contained an NcoI site at one end and a BamHI site at the other.
After restriction enzyme digestion, the resulting fragment was cloned 
into the pET6H plasmid, which contains a His-tag coding region. 
The resultant protein fragment included an extra MHHHHHHAMG 
sequence at the N-terminus. 
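
As a rough consistency check on the two constructs, the sketch below counts the residues in each fragment, adds the 10-residue MHHHHHHAMG extension, and estimates the chain mass. The figure of about 110 Da per residue is a common rule of thumb and is an assumption here, not a value from this work; the function names are illustrative only.

```python
# Rough size estimate for the His-tagged constructs (illustrative sketch).
# ASSUMPTION: ~110 Da average residue mass is a rule of thumb, not a measured value.

TAG = "MHHHHHHAMG"           # N-terminal extension introduced by cloning into pET6H
AVG_RESIDUE_MASS_DA = 110.0  # assumed average mass per residue


def fragment_length(first: int, last: int) -> int:
    """Number of residues in an inclusive residue range."""
    return last - first + 1


def approx_mass_kda(n_residues: int, copies: int = 1) -> float:
    """Approximate mass in kDa for `copies` chains of `n_residues` residues."""
    return n_residues * AVG_RESIDUE_MASS_DA * copies / 1000.0


constructs = {"NP248-365": (248, 365), "NP281-365": (281, 365)}
for name, (first, last) in constructs.items():
    n = fragment_length(first, last) + len(TAG)
    print(f"{name}: {n} residues with tag, "
          f"monomer ~{approx_mass_kda(n):.1f} kDa, "
          f"dimer ~{approx_mass_kda(n, 2):.1f} kDa")
```

Under these assumptions the tagged NP248-365 chain has 128 residues, which puts the dimer at roughly 28 kDa.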


2.2. Protein expression and purification 

The fragments corresponding to residues 248-365 (NP248-365) and 281-365
(NP281-365) of SARS-CoV N protein were expressed in Escherichia coli
BL21(DE3) strain. Isotopically labeled protein samples were prepared by
culturing the cells in standard M9 media supplemented with 15NH4Cl
(1 g/L) (for 15N-labeling) and/or u-13C-glucose (2 g/L) (for
13C-labeling) and appropriately labeled Isogro (0.5 g/L) (Isotec, OH,
USA). Perdeuterated isotopically labeled protein samples were prepared
by culturing the cells in the same media in D2O (80% D2O for samples
used in filtered experiments) supplemented with deuterated Isogro and
glucose. Deuteration rates for all clones were on the order of 85% (65%
for samples used in filtered experiments) as measured by mass
spectrometry. The cells were broken with a microfluidizer and the
protein was purified through a Ni-NTA affinity column (Qiagen, CA, USA)
in buffer (50 mM sodium phosphate, 150 mM NaCl, pH 7.4) containing 7 M
urea. The protein was then allowed to refold by dialysis in liquid
chromatography buffer (50 mM sodium phosphate, 150 mM NaCl, 1 mM EDTA,
0.01% NaN3, pH 7.4).
Fig. 1. (A) 15N-HSQC spectrum of u-15N-NP248-365. (B) Summary of the NMR parameters employed for secondary structure prediction. Dots at the
top indicate residues whose NH protons are protected from deuterium exchange after 24 h. (C) Secondary structure profile of the SARS-CoV N protein.
The two shaded areas represent the N-terminal and C-terminal structural domains. Secondary structure of the N-terminal domain was adapted from
Huang et al. [27].
Renatured protein was loaded onto an AKTA-EXPLORER fast performance
liquid chromatography (FPLC) system equipped with a HiLoad 16/60
Superdex 75 column (Amersham Pharmacia Biotech, Sweden). Complete
Protease Inhibitor cocktail (Roche, Germany) was added to the purified
protein. Protein concentration was determined with the Bio-Rad Protein
Assay kit as per instructions from the manufacturer (Bio-Rad, CA, USA).
The correct molecular weight of the expressed protein was then confirmed
by mass spectroscopy.


2.3. NMR spectroscopy

Protein samples for NMR experiments contained between 0.5 and 3 mM
protein in NMR buffer (10 mM sodium phosphate buffer, pH [...], [...]
2,2-dimethyl-2-silapentane-5-sulfonate (DSS), 0.01% NaN3, 10% D2O and
complete protease inhibitor cocktail). All NMR data were acquired at
30 °C on 500, 600 or 800 MHz Bruker AVANCE spectrometers equipped with a
triple resonance TXI cryoprobe with an actively-shielded Z-gradient.
Experimental parameters were set as described previously [11,12].
Sequential backbone resonance assignments for 1HN, 15N, 13C', and
13Cα/β were derived from standard 3D HNCA, HN(CO)CA, HNCO, HN(CA)CO,
CBCANH, and CBCA(CO)NH experiments [13]. (Hm)CmCH-TOCSY spectra were
also acquired to correct for the isotope effect on 13C chemical shifts
[14]. The assignments of Hα and Hβ resonances were achieved from
analysis of the HBHA(CO)NH spectrum. H(CC)(CO)NH, CC(CO)NH and
HCCH-TOCSY spectra were analyzed to obtain side chain assignments. To
identify the interface region involved in dimer interactions,
F1[13C, 15N]-filtered, F3-15N-edited and F1[13C, 15N]-filtered,
F3-13C-edited 3D NOESY-HSQC spectra were obtained using a
(u-2H,13C,15N)NP248-365 (65% deuteration)/NP248-365 hetero-dimer sample
prepared by mixing labeled NP248-365 with an equal amount of unlabeled
NP248-365 [15,16]. The protein was denatured in 8 M urea and renatured
by extensive [...] with [...] software (Bruker, Germany) on SGI
workstations or NMRPipe on Linux workstations [17]. The 1H chemical
shift was referenced to DSS at 0 ppm as suggested [18].


2.4. Static light scattering

Protein samples in NMR buffer were diluted to a concentration between
0.5 and 2 mg/ml. Prior to loading into the cuvette, samples were
filtered through a 0.22 µm-cutoff filter. Data were acquired on a
DynaPro MS/X light scattering system equipped with a fixed-angle
detector (Protein Solutions, NJ, USA) at 4 °C. Analysis was carried out
on the Dynamics V6 program suite included in the system on an IBM
PC-compatible computer. Concentration effects were corrected within the
program.


Fig. 2. (A) Strip plots showing the intermolecular NOE connectivities in the β-sheet (left panel) and between the side chain resonances of residues in
the β2 strand of one monomer (labeled on top of the strips) and side chain resonances of residues in the E helix of the other monomer (indicated by
arrows) (right panel). The strips in the left panel were selected from the F1[13C, 15N]-filtered, F3-15N-edited 3D NOESY-HSQC spectrum and the
strips in the right panel were selected from the F1[13C, 15N]-filtered, F3-13C-edited 3D NOESY-HSQC spectrum. (B) NOE connectivities of the β-sheet
forming the dimer interface. The shaded arrows and the connecting loops represent the two β hairpins of the two monomers. The two-headed arrows
show the observed NOE pairs and the dotted lines are the proposed hydrogen bonds stabilizing the β hairpins, as well as the dimer interface between
the two β hairpins. The dotted rectangular boxes represent the positions of the two helices which interact with the four-stranded β-sheet. The boxed
residues are those involved in hydrophobic interaction with the helices. The NOEs between β1 and β2 (also β1' and β2' in the other monomer) were
obtained from the 3D 15N-NOESY-HSQC spectrum of the u-15N-NP248-365 sample, and the interfacial NOEs between β2 and β2' were obtained from the
15N-filtered 3D NOESY-HSQC spectrum of a sample containing the u-(2H, 13C, 15N)-NP248-365 (65% deuteration)/unlabeled NP248-365 hetero-dimer.


2.5. Chemical cross-linking 

The homobifunctional amine cross-linker disuccinimidyl suberate was
purchased from Sigma-Aldrich (MO, USA) and prepared in
N,N-dimethylformamide (DMF) to a concentration of 25 mg/ml. Reactions
were carried out with a final protein concentration of 0.35 mM and a
final cross-linker concentration of 5 mM. Mock reactions containing only
the protein solution and DMF, without cross-linker, were set up as
controls. The reaction mixtures in NMR buffer were allowed to react for
1 h at 4 °C prior to quenching with 100 mM glycine. The results were
visualized on SDS-PhastGel minigels (Pharmacia Biotech, Sweden).
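
The stoichiometry implied by these concentrations can be worked out in a few lines. The sketch below converts the 25 mg/ml stock to a molar concentration and reports the molar excess of cross-linker over protein. The molar mass of disuccinimidyl suberate (368.34 g/mol, from the formula C16H20N2O8) is an assumption supplied here and is not stated in the text.

```python
# Cross-linker stoichiometry for the reaction conditions (illustrative sketch).
# ASSUMPTION: molar mass of disuccinimidyl suberate taken as 368.34 g/mol
# (formula C16H20N2O8); this value is not given in the source text.

CROSSLINKER_MOLAR_MASS = 368.34  # g/mol, assumed
STOCK_MG_PER_ML = 25.0           # stock concentration in DMF
FINAL_LINKER_MM = 5.0            # final cross-linker concentration (mM)
FINAL_PROTEIN_MM = 0.35          # final protein concentration (mM)


def stock_molarity_mm(mg_per_ml: float, molar_mass: float) -> float:
    """Convert mg/ml (numerically equal to g/L) to mM."""
    return mg_per_ml / molar_mass * 1000.0


stock_mm = stock_molarity_mm(STOCK_MG_PER_ML, CROSSLINKER_MOLAR_MASS)
dilution = stock_mm / FINAL_LINKER_MM
molar_excess = FINAL_LINKER_MM / FINAL_PROTEIN_MM

print(f"stock ~{stock_mm:.1f} mM, diluted ~{dilution:.1f}-fold, "
      f"~{molar_excess:.1f}-fold molar excess over protein")
```

With these inputs the cross-linker is present at about a 14-fold molar excess over protein chains.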


3. Results and discussion 


3.1. Secondary structure of the dimerization domain 

We have previously shown that SARS-CoV N protein consists of two
structured domains, the RNA binding domain (a.a. 45-181) and the
dimerization domain (a.a. 248-365), with the remainder of the sequence
existing in a disordered state [8]. NP248-365 is the most stable domain
that retains the dimer structure. Shortening the fragment causes
structural changes, whereas lengthening the fragment has no effect on
the structure of NP248-365. Backbone assignment was achieved for most
amino acids of NP248-365 except for residues located at the N-terminus
and H301 (Fig. 1A). Perdeuteration of NP248-365 was necessary to obtain
triple-resonance spectra due to the short T2 (transverse relaxation
time) of the dimer. The secondary structure of NP248-365 was determined
from standard NMR parameters, such as the characteristic NOE patterns,
the consensus chemical shift indices (CSI), the magnitude of the
3J(HN,Hα) value, and the HN exchange rates (Fig. 1B). The results show
that NP248-365 consists of five α-helices (A, Val271-Phe274; B,
Gln290-Gln295; C, Trp302-Phe308; D, Ala312-Gly317 and E, Phe347-Ala360)
and two β strands (β1: Arg320-Thr326; β2: Gly329-Leu340) (see Fig. 1C).
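
To illustrate how a chemical shift index turns per-residue shifts into secondary structure calls, the following is a minimal sketch for C-alpha secondary shifts only. It is not the consensus procedure used in this work: the 0.7 ppm cutoff, the minimum run lengths, and the input values are assumptions chosen for illustration, and the published CSI rules include additional criteria that are omitted here.

```python
# Simplified chemical shift index (CSI) sketch for C-alpha secondary shifts.
# ASSUMPTIONS: inputs are (observed - random coil) shifts in ppm; the 0.7 ppm
# threshold and the run lengths are simplified stand-ins for the published rules.
from itertools import groupby
from typing import List, Tuple

THRESHOLD_PPM = 0.7   # assumed cutoff for a significant C-alpha deviation
MIN_HELIX_RUN = 4     # assumed minimum run of +1 indices to call a helix
MIN_STRAND_RUN = 3    # assumed minimum run of -1 indices to call a strand


def csi_index(delta_ppm: float) -> int:
    """Map a C-alpha secondary shift to +1 (helix-like), -1 (strand-like) or 0."""
    if delta_ppm > THRESHOLD_PPM:
        return 1
    if delta_ppm < -THRESHOLD_PPM:
        return -1
    return 0


def call_elements(deltas: List[float]) -> List[Tuple[str, int, int]]:
    """Return (type, start, end) for runs long enough to call; 0-based, inclusive."""
    indices = [csi_index(d) for d in deltas]
    elements = []
    pos = 0
    for value, run in groupby(indices):
        length = len(list(run))
        if value == 1 and length >= MIN_HELIX_RUN:
            elements.append(("helix", pos, pos + length - 1))
        elif value == -1 and length >= MIN_STRAND_RUN:
            elements.append(("strand", pos, pos + length - 1))
        pos += length
    return elements


# Hypothetical secondary shifts, invented purely to exercise the code.
example = [0.1, 1.2, 1.5, 2.0, 1.1, 0.2, -1.0, -1.3, -0.9, 0.0]
print(call_elements(example))  # [('helix', 1, 4), ('strand', 6, 8)]
```

In practice such calls would be combined with the NOE patterns, coupling constants and exchange data listed above rather than used alone.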


3.2. The dimer interface is composed of a β-sheet stabilized by
helix E

The short T2 of the NP248-365 dimer (MW = 28 kD) precluded full
assignment of side chain resonances due to weak signals even on
cryoprobe-equipped spectrometers, and thus hampered complete 3D
structure determination of the dimer at high resolution. However, most
Hα, Hβ and aliphatic side-chain nuclei could be assigned from
(Hm)CmCH-TOCSY, HCCH-TOCSY and 13C-edited NOESY spectra. Further
analysis of the intramolecular dimer-interface NOEs identified a number
of contacts between the two β strands, which allowed us to define the β
hairpin structure (Fig. 2A). This information allowed us to manually
assign the intermolecular NOEs at the dimer interface from analysis of
the F1[13C, 15N]-filtered, F3-15N-edited and F1[13C, 15N]-filtered,
F3-13C-edited 3D NOESY-HSQC spectra using the u-(2H,13C,15N)NP248-365
(65% deuteration)/unlabeled NP248-365 sample. Our results indicate that
the dimer interface is composed of a continuous four-stranded β-sheet,
formed by extensive hydrogen bond interactions between the two long β
strands of the two β-hairpins, one contributed by each of the two
monomers (Figs. 2B and 3A). The dimer is further stabilized by
hydrophobic interactions between residues on one side of the
amphipathic long helix E and the β-sheet (Fig. 3B). These extensive
interactions between the two monomers provide a strong stabilizing
force and explain why the two monomers cannot be separated without
denaturing the protein, even though there is no cysteine in the
dimerization domain to form a covalent disulfide bond. This arrangement
is reminiscent of the dimer interface of the porcine reproductive and
respiratory syndrome virus (PRRSV) nucleocapsid protein (Fig. 3C) [19],
the coat protein of bacteriophage MS2 (Fig. 3D) [19] and the peptide
recognition domain of the human histocompatibility antigen (HLA)
[20,21].


Fig. 3. (A) Schematic representation of the structure of the dimer interface of SARS-CoV N protein. The relative orientation between the
anti-parallel β-sheet and the E helix is defined by the six NOEs identified in Fig. 2A. Residues involved in these NOEs are shown in ball-and-stick
representation. (B) Helical wheel plot of helix E, showing the amphipathic nature of the helix. The hydrophobic face is defined by the four
hydrophobic residues (colored green). (C) Ribbon representation of the structure of the dimer interface of the C-terminal domain of the nucleocapsid
protein of porcine reproductive and respiratory syndrome virus (PRRSV) (PDB ID: 1P65). (D) Ribbon representation of the structure of the dimer
interface of the capsid protein of bacteriophage MS2 (PDB ID: 1AQ3). The ribbon representations were prepared with the MOLMOL program.
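
The amphipathic character of a helix such as helix E is usually judged from a helical wheel, in which successive residues of an ideal α helix are placed 100° apart. The sketch below computes those angular positions and a crude, binary version of a hydrophobic moment. The sequence used is a hypothetical amphipathic peptide, not the sequence of helix E, and the hydrophobic residue set is a simplifying assumption.

```python
# Helical wheel positions for an ideal alpha helix (illustrative sketch).
# ASSUMPTIONS: 100 degrees per residue (3.6 residues per turn); the sequence and
# the hydrophobic residue set are hypothetical and are NOT taken from helix E.
import math
from typing import List, Tuple

DEGREES_PER_RESIDUE = 100.0
HYDROPHOBIC = set("AVLIMFWC")  # assumed, simplified hydrophobic set


def wheel_angles(sequence: str) -> List[Tuple[str, float]]:
    """Angular position (degrees, 0 to 360) of each residue on the wheel."""
    return [(aa, (i * DEGREES_PER_RESIDUE) % 360.0)
            for i, aa in enumerate(sequence)]


def hydrophobic_moment(sequence: str) -> float:
    """Length of the vector sum of unit vectors for hydrophobic residues,
    normalized by sequence length; larger values suggest one-sided clustering."""
    x = y = 0.0
    for aa, angle in wheel_angles(sequence):
        if aa in HYDROPHOBIC:
            x += math.cos(math.radians(angle))
            y += math.sin(math.radians(angle))
    return math.hypot(x, y) / len(sequence)


toy_helix = "LKELLKKLLEELLK"  # hypothetical amphipathic 14-mer
for aa, angle in wheel_angles(toy_helix):
    print(f"{aa} {angle:5.1f}")
print(f"moment: {hydrophobic_moment(toy_helix):.2f}")
```

A helix whose hydrophobic residues cluster within a narrow arc of the wheel gives a larger moment than one in which they are spread evenly around it.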


3.3. The stable dimer interface is an ideal common building block 
for dimer interfaces 

It has been postulated that, although coronaviruses are evolutionarily
related to arteriviruses, the large size discrepancy between their
nucleocapsid proteins most likely implies that they have different
folds [22]. However, we show that there are common principles that
underlie the architecture of the nucleocapsid protein in both SARS-CoV
and PRRSV. Both contain two regions, one for RNA-binding and the other
for dimerization or oligomerization [8,23], although the two domains in
SARS-CoV N protein are linked by a much longer flexible linker of
~120 a.a. Most importantly, the structures of the dimer interfaces of
the two viruses are very similar. The presence of extensive
interactions in the dimer interface may render it self-contained, i.e.,
less likely to be dependent on the global structure of the protein. In
fact, we have found that NP281-365 still retained dimerization
potential, even though structural differences were evident from NMR
spectra (Fig. 4). Similar observations have been reported in the
literature [6]. It is evident that the tight dimer interface structure
permits a certain degree of perturbation in the global structure
without affecting the dimer structure. Such characteristics make this
particular fold ideal as a common building block for dimer interfaces
in a variety of proteins.


Fig. 4. (A) Light scattering results of NP248-365 and NP281-365. Estimated particle radii and molecular weights are listed. (B) Chemical cross-linking
of NP248-365 (lanes 1 and 2) and NP281-365 (lanes 3 and 4). Lanes 1 and 3: without cross-linker. Lanes 2 and 4: with cross-linker. (C) 15N-edited HSQC
spectra of NP248-365 (left) and NP281-365 (right).

Fig. 5. Alignment of the amino acid sequences of various coronavirus N proteins. The alignment shows only the regions corresponding to the dimer
interface region of SARS-CoV. From top to bottom: SARS-CoV, porcine transmissible gastroenteritis virus (TGEV), feline coronavirus (FCoV),
human coronavirus strain 229E (HCoV 229E), bovine coronavirus (BCoV), human coronavirus strain OC43 (HCoV OC43), porcine
hemagglutinating encephalomyelitis virus (PHEV), murine hepatitis virus 1 (MHV-1) and avian infectious bronchitis virus (IBV). JPred secondary
structure predictions of the sequences are shown below the sequences. E and H represent the predicted secondary structure of a particular amino acid
as β-strand or α helix, respectively.


3.4. The dimerization mechanism may be common among 
coronavirus nucleocapsid proteins 

To investigate whether all coronavirus N proteins share this
dimerization mechanism, we used ClustalX to align the sequences of
other coronaviruses [24], and the resulting sub-sequences were then
submitted to the JPred server for secondary structure prediction [25].
Sequence alignment coupled with secondary structure prediction shows
that many share the ββα topology observed in SARS-CoV (Fig. 5). In
particular, the long β-strand and the long C-terminal helix are
predicted to be present in all cases. Most of them also contain the
short β-strand, with the exceptions of BCoV, HCoV and PHEV. These
results raise the possibility that all coronaviruses employ the same
interface mechanism for dimerization and belong to the same structural
class; however, this cannot be verified by the class-dependent
prediction algorithm because of the lack of a known tertiary structure
[26].
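
The comparison of predicted topologies described above can be sketched as a simple string operation: collapse each per-residue prediction into a sequence of elements and look for two strands followed by a helix. The code below assumes predictions encoded with E (strand), H (helix) and - (other), as in the Fig. 5 legend. The minimum run length and the example strings are hypothetical and are not real JPred output.

```python
# Collapse a per-residue secondary structure prediction into an element string
# and test for a strand-strand-helix (beta-beta-alpha) pattern. Illustrative sketch.
# ASSUMPTIONS: predictions use 'E' (strand), 'H' (helix), '-' (other); the
# minimum run length and the example strings are hypothetical.
from itertools import groupby

MIN_RUN = 3  # assumed minimum residues for a run to count as an element


def element_string(prediction: str, min_run: int = MIN_RUN) -> str:
    """Return e.g. 'EEH' for strand, strand, helix in sequence order."""
    elements = []
    for state, run in groupby(prediction):
        if state in ("E", "H") and len(list(run)) >= min_run:
            elements.append(state)
    return "".join(elements)


def has_bba_topology(prediction: str) -> bool:
    """True if two strands are followed by a helix somewhere in the prediction."""
    return "EEH" in element_string(prediction)


# Hypothetical predictions, not real JPred output.
examples = {
    "with_short_strand": "--EEEEEE---EEEEEEEEEEE------HHHHHHHHHHHHH--",
    "without_short_strand": "-----------EEEEEEEEEEE------HHHHHHHHHHHHH--",
}
for name, pred in examples.items():
    print(name, element_string(pred), has_bba_topology(pred))
```

A prediction that lacks the short strand collapses to a single strand followed by a helix and therefore fails the test, even though the long strand and the C-terminal helix are both present.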

In conclusion, we have determined the secondary structure of 
the dimerization domain of SARS-CoV N protein and have 
mapped out the residues involved in the interface. We show that 
the interface of SARS-CoV N protein dimer is a four-stranded 
β-sheet, superposed by two long helices. The topology closely
resembles that of the PRRSV nucleocapsid protein and the coat
protein of bacteriophage MS2. This type of dimer interface is
highly stable and could serve as one of the common building 
blocks for dimer interfaces in nature. Sequence alignment and 
secondary structure prediction suggest that other coronavirus 
N proteins also adopt a similar dimerization mechanism. 